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BACKGROUND AND HISTORY OF PESTICIDAL 
CRYSTAL PROTEIN NOMENCLATURE 

Since the first cloning of an insecticidal crystal protein gene 
from Bacillus thuringiensis (91), many other such genes have 
been isolated. Initially, each newly characterized gene or pro- 
tein received an arbitrary designation from its discoverers: icp 
(64); cry (21, 121); kurhdl (31); Bta (88); btl, bt2, etc. (40); 
type B and type C (43); and 4.5 kb, 5.3 kb, and 6.6 kb (55). The 
first systematic attempt to organize the genetic nomenclature 
relied on the insecticidal activities of crystal proteins for the 
primary ranking of their corresponding genes (44). The cryl 
genes encoded proteins toxic to lepidopterans; cryll genes en- 
coded proteins toxic to both lepidopterans and dipterans; crylll 
genes encoded proteins toxic to coleopterans; and cry/K genes 
encoded proteins toxic to dipterans alone. 

This system provided a useful framework for classifying the 
ever-expanding set of known genes. Inconsistencies existed in 
the original scheme, however, due to attempts to accommo- 
date genes that were highly homologous to known genes but 
did not encode a toxin with a similar insecticidal spectrum. The 
cryllB gene, for example, received a place in the lepidopteran- 
dipteran class with cryllA, even though toxicity against dipter- 
ans could not be demonstrated for the toxin designated 
CryllB. Other anomalies arose after the nomenclature was 
established. The protein named CrylC, for example, was re- 
ported to be toxic to both dipterans and lepidopterans (103), 
while the protein designated CrylB was reported to be toxic to 
both lepidopterans and coleopterans (8). Because the nomen- 
clature system provided no central committee or database to 
maintain standardization, new genes encoding a diverse set of 
proteins without a common insecticidal activity each received 
the name cryV, based on the next available Roman numeral 
(32, 46, 67, 100, 102, 108). 

PROPOSED NOMENCLATURE 

We propose in this review a revised nomenclature for the cry 
and cyt genes. To organize the wealth of data produced by 
genomic sequencing efforts, a new nomenclatural paradigm is 
emerging, exemplified by the internationally recognized cyto- 
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chrome P-450 superfamily nomenclature system (68a, 122a). 
Our proposal conforms closely to this model both in concep- 
tual basis and in nomenclature format. The underlying basis of 
this type of system is to assign names to members of gene 
superfamilies according to their degree of evolutionary diver- 
gence as estimated by phylogenetic tree algorithms. The no- 
menclature format in such a system is designed to convey rich 
informational content about these relationships by appending 
to the mnemonic root a series of numerals and letters assigned 
in a hierarchical fashion to indicate degrees of phylogenetic 
divergence. This change from a function-based to a sequence- 
based nomenclature allows closely related toxins to be ranked 
together and removes the necessity for researchers to bioassay 
each new protein against a growing series of organisms before 
assigning it a name. 

In our proposed revision, Roman numerals have been ex- 
changed for Arabic numerals in the primary rank (e.g., 
CrylAa) to better accommodate the large number of expected 
new proteins. The mnemonic Cyt to designate crystal proteins 
showing a general cytolytic activity in vitro has been retained 
because of its historical precedent and entrenchment in the 
research literature. Our definition of a Cry protein is rather 
broad: a parasporal inclusion (crystal) protein from B. thurin- 
giensis that exhibits some experimentally verifiable toxic effect 
to a target organism, or any protein that has obvious sequence 
similarity to a known Cry protein. Similarly, Cyt denotes a 
parasporal inclusion (crystal) protein from B. thuringiensis that 
exhibits hemolytic activity, or any protein that has obvious 
sequence similarity to a known Cyt protein. By these criteria, 
the nontoxic 40-kDa crystal protein from B. thuringiensis subsp. 
thompsoni, for example, has been excluded from our list, but 
the lepidopteran-active 34-kDa protein (now Cryl5A) en- 
coded by an adjacent gene has been included (11). 

The freely available software applications CLUSTAL W 
(110) and PHYLIP (27) define the sequence relationships 
among the toxins to form the framework of the new nomen- 
clature. In the first step, CLUSTAL W aligns the deduced 
amino acid sequences of the full-length toxins and produces a 
distance matrix, quantitating the sequence similarities among 
the set of toxins. CLUSTAL W default settings are employed, 
except that the "delay divergent sequences" setting in the mul- 
tiple-alignment parameter menu is reduced from 40 to 0%. 
The NEIGHBOR application within the PHYLIP package 
then constructs a phylogenetic tree from the distance matrix by 
an unweighted pair-group method using arithmetic averages 
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(UPGMA) algorithm. The TREEVIEW application (73), with 
the "phylogenetic tree" and "ladderize left" options selected, 
produces a graphic presentation of the resulting tree. 

We have applied this procedure to the set of holotype se- 
quences given in Table 1 to produce the phylogenetic tree 
presented in Fig. 1. Vertical lines drawn through the tree show 
the boundaries used to define the various nomenclatural ranks. 
The name given to any particular toxin depends on the location 
of the node where the toxin enters the tree relative to these 
boundaries. A new toxin that joins the tree to the left of the 
leftmost boundary will be assigned a new primary rank (an 
Arabic number). A toxin that enters the tree between the left 
and central boundaries will be assigned a new secondary rank 
(an uppercase letter). It will have the same primary rank as the 
other toxins within that cluster. A toxin that enters the tree 
between the central and right boundaries will be assigned a 
new tertiary rank (a lowercase letter). Finally, a toxin that joins 
the tree to the right of the rightmost boundary will be assigned 
a new quaternary rank (another Arabic number). Toxins with 
identical sequences but isolated independently will receive sep- 
arate quaternary ranks. 

By this method each toxin will be assigned a unique name 
incorporating all four ranks. A completely novel toxin would 
currently be assigned the name Cry23Aal. For the sake of 
convenience, however, we propose that the inclusion of the 
tertiary rank a and quaternary rank 1 be optional, their use 
dictated only by a need for clarity. This new toxin could there- 
fore simply be referred to as Cry23A. 

In choosing locations for rank boundaries, we attempted to 
construct a nomenclature reflecting significant evolutionary 
relationships while at the same time minimizing changes from 
the gene names assigned under the old system. In the resulting 
system, proteins with a common primary rank are similar 
enough that the percent identity can be defined with some 
confidence. Proteins with the same primary rank often affect 
the same order of insect; those with different secondary and 
tertiary ranks may have altered potency and targeting within an 
order. At the tertiary rank, differences can be due to the ac- 
cumulation of dispersed point mutations, but often they appear 
to have resulted from ancestral recombination events between 
genes differing at a lower rank level (9). The quaternary rank 
was established to group "alleles" of genes coding for known 
toxins that differ only slightly, either because of a few muta- 
tional changes or an imprecision in sequencing. To avoid con- 
fusion, however, the reader should bear in mind the differences 
between the quaternary rank number and the classical concept 
of the allele. Any cry gene specified with a quaternary rank is 
a natural isolate. No assumption about functionality is implied 
by the presence of this rank number in the gene name. In 
contrast, an allele number would be assumed, unless paren- 
thetical or subscripted information indicated otherwise, to de- 
note a nonfunctional mutant form of a wild-type gene found at 
a discrete genetic locus. Because of the somewhat modular 
nature of the Cry proteins and the effect that various segmental 
relationships could have on the clustering algorithm, it is likely 
that these boundaries will move slightly or even bend as the 
addition of new sequences changes the topology of the phylo- 
genetic tree. Currently the boundaries represent approxi- 
mately 95, 78, and 45% sequence identity. 

A B, thuringiensis Pesticidal Crystal Protein Nomenclature 
Committee, consisting of the authors of this paper, will remain 
as a standing committee of the Bacillus Genetic Stock Center 
(BGSC) to assist workers in the field of 5. thuringiensis genetics 
in assigning names to new Cry and Cyt toxins. The correspond- 
ing gene or protein sequences must first be deposited into a 
publicly accessible database (GenBank, EMBL, or PIR) and 



released by the repository for electronic publication in the 
database so that the scientific community may conduct an 
independent analysis. Researchers should submit new se- 
quences directly to the BGSC director (D. R. Zeigler), either 
by electronic mail (zeigler.l@osu.edu) or on computer dis- 
kette. The director will analyze the amino acid sequence as 
described above and suggest the appropriate name, subject to 
the approval of the committee. The committee will periodically 
review the literature of the Cry and Cyt toxins and publish a 
comprehensive list. This list, alongside other relevant informa- 
tion, will also be available via the Internet at the following 
URL: http:/Avww.biols.susx.ac.uk/Home/Neil_Crickmore/Bt/. 

The current list of cry and cyt genes (including quaternary 
ranks) is given in Table 1. New gene names are listed with their 
previous names, their GenBank accession numbers, and pub- 
lished references. The quaternary ranks were assigned in the 
order that the gene sequences were discovered in the literature 
or submitted to the committee. Genes assigned the quaternary 
rank 1 represent holotype sequences. 

The boundaries shown in Fig. 1 allow most cry genes to 
retain the names they received under the system of Hofte and 
Whiteley (44), after a substitution of Arabic for Roman nu- 
merals. There are a few notable exceptions: crylG becomes 
cry9A, crylllC becomes cryJAa, crylllD becomes cry3C, crylVC 
becomes crylOA, crylVD becomes cryllA, cytA becomes cytJA, 
and cytB becomes cyt2A (Table 1). Under the revised system, 
the known Cry and Cyt proteins fall into 24 sets at the primary 
rank— Cytl, Cyt2, and Cryl through Cry22. 

ROBUSTNESS OF THE NOMENCLATURE 

The robustness of the current naming process was assessed 
by a number of additional analyses. The choice of clustering 
algorithm (unweighted pair-group method using arithmetic av- 
erages) was driven largely by the consistent location of a root 
and constant branch lengths, resulting in a common vertical 
alignment of sequence names and essentially allowing a "ruler 
across the tree" approach to naming. It has the drawback of 
imposing a common evolutionary clock on the clustering pro- 
cess, an assumption that cannot be assured. The distance met- 
ric related to percent identity (essentially 1 minus the fraction 
of identical residues of the total compared without gaps) is the 
one most commonly found as the output of sequence compar- 
ison programs, including CLUSTAL W. For phylogenetic anal- 
ysis, a more usual distance metric relates to the number of 
substitutions per site to convert one sequence to the other 
(e.g., Dayhoff's point accepted mutation [PAM]) and accounts 
for the possibility of multiple substitutions per site as the se- 
quences are more divergent. The latter method has the draw- 
back of being more computationally intensive, and, for very 
divergent sequences, requiring too large a value, resulting in 
numeric computation failures. They also differ in the way se- 
quences of unequal length are handled, with the percent iden- 
tity method typically ignoring excess sequence and the other 
methods assigning a penalty. This is particularly important for 
crystal proteins, since a number of them lack the C-terminal 
protoxin segments yet are quite related to some longer toxins 
in the N-terminal toxin segment; we feel that the stronger 
association of such relationships found by the percent identity 
method is preferred. 

To assess the effect of using the neighbor-joining method to 
generate an unrooted tree, CLUSTAL W routines were used 
to generate such a tree with 1,000 bootstraps of the sequence 
alignment we used for Fig. 1. When an appropriate outgroup 
was chosen, the resulting tree (not shown) resembled our Fig. 
1. The bootstrap values indicated that the tree thus generated 
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TABLE 1. Known cry and cyt gene sequences with revised nomenclature assignments 



Revised 
gene name 


Original gene or 
protein name 


Accession 
no. 


Coding 
region" 


Keference 


crylAal 


cryIA{a) 


Ml 1250 


527-4054 


92 


crylAal 


cryIA{a) 


M10917 


153->2955 


98 


crylAaS 


cryIA{a) 


D00348 


73-3600 


99 


crylAa4 


crylA{a) 


X13535 


1-3528 


62 


crylAaS 


crylA{a) 


D17518 


81-3608 


113 


crylAad 


cryL4{a) 


U43605 


1->1860 


63 


crylAbl 


crylA(b) 


Ml 3898 


142-3606 


119 


crylAb2 


crylAlb) 


M12661 


155-3622 


111 


crylAbS 


cryIA{b) 


M15271 


156-3620 


31 


crylAb4 


cryIA{b) 


D00117 


163-3627 


50 


crylAbS 


cryIA{b) 


X04698 


141-3605 


40 


cjylAbd 


crylAlb) 


M37263 


73-3537 


37 


crylAbl 


crylAlb) 


X13233 


1-3465 


36 


crylAbS 


crylA{b) 


Ml 6463 


157-3621 


69 


crylAb9 


crylAlb) 


X54939 


73-3537 


13 


crylAblO 


crylA{b) 


A29125 




28 


crylAcl 


cryLilc) 


Ml 1068 


388-3921 


3 


crylAc2 


crylA(c) 


M35524 


239-3769 


117 


crylAcS 


cryIA(c) 


X54159 


339->2192 


18 


crylAc4 


cryIA{c) 


M73249 


1-3534 


84 


crylAcS 


cryIA{c) 


M73248 


1-3531 


83 


crylAcO 


cryIA(c) 


U43606 


1->1821 


63 


crylAcJ 


cryIA{c) 


U87793 


976-4509 


38 


crylAcS 


crylA^c) 


U87397 


153-3686 


71 


cryIAc9 


cryIA{c) 


U89872 


388-3921 


33 


crylAclO 




AJ002514 


388-3921 


107 


crylAdl 


crylA{c) 


M73250 


1-3537 


79 


crylAel 


crylA{e) 


M65252 


81-3623 


60 


crylAfl 


icp 


U82003 


172->2905 


49 


crylBal 


crylB 


X06711 


1-3684 


10 


crylBa2 


X95704 


186-3869 


105 


crylBbl 


ETS 


L32020 


67-3753 


25 


crylBcl 


cryIB{c) 


Z46442 


141-3839 


6 


crylBdl 


cryEI 


U70726 




12 


crylCal 


crylC 


X07518 


47-3613 


45 


crylCa2 


crylC 


X13620 


241->2711 


88 


crylCaS 


crylC 


M73251 


1-3570 


79 


cryICa4 


crylC 


A27642 


234-3800 


114 


crylCaS 


crylC 


X96682 


l->2268 


106 


crylCad 


crylC 


X96683 


l->2268 


106 


crylCa? 


crylC 


X96684 


l->2268 


106 


crylCbl 


crylCip) 


M97880 


296-3823 


48 


crylDal 


crylD 


X54160 


264-3758 


42 


cryWbl 


prtB 


Z22511 


241-3720 


56 


crylEal 


crylE 


X53985 


130-3642 


115 


crylEa2 


crylE 


X56144 


1-3513 


7 


crylEaS 


crylE 


M73252 


1-3513 


82 


crylEa4 


U94323 


388-3900 


47 


crylEbl 


crylE(b) 


M73253 


1-3522 


81 


crylFal 


crylF 


M63897 


478-3999 


14 


crylFa2 


crylF 


M73254 


1-3525 


80 


crylFbl 


prtD 


Z22512 


483-4004 


56 


crylGal 


prtA 


Z22510 


67-3564 


56 


crylGa2 


crylM 


Y09326 


692-4210 


96 


crylGbl 


cryH2 


U70725 




12 


crylHal 


prtC 


Z22513 


530-4045 


56 


crylHbl 


U35780 


728-4195 


53 


cry Hal 


cryV 


X62821 


355-2511 


108 


crylla2 


cryV 


M98544 


1-2157 


34 


cryllaS 


cryV 


U6338 


279-2435 


100 


crylla4 


cryV 




Dl— Zzl / 


j4 


cryilaS 


cryV159 


Y08920 


524-2680 


94 


cryllbl 


cryV465 


U07642 


237-2393 


100 


crylJal 


ET4 


L32019 


99-3519 


25 


crylJbl 


ETl 


U31527 


177-3686 


116 


crylKal 




U28801 


451^098 


52 


crylAal 


cryllA 


M31738 


156-2054 


20 


cry2Aa2 


cryllA 


M23723 


1840-3738 


123 


cry2Aa3 


D86064 


2007-3911 


89 


cry2Abl 


cryllB 


M23724 


1-1899 


123 



Revised 
gene name 


Original gene or 
protein name 


Accession 
no. 


21 25-3 990 > 


Referenc 


cry2Ab2 


cryllB 


X55416 


S14-2115 


17 


cry2Acl 


cryllC 


X57252 


2125-3990 


124 


crySAal 


crylllA 


M22472 


25-1956 


39 


cry3Aa2 


crylllA 


J02978 


241-2172 


93 


cry3Aa3 


crylJlA 


Y00420 


566-2497 


41 


cry3Aa4 


crylllA 


M30503 


201-2132 


65 


cry3Aa5 


crylllA 


M37207 


569-2500 


22 


cry3Aa6 


crylllA 


U10985 


569-2500 


1 


cry3Bal 


cryIIlB2 


X17123 


25->1977 


101 


cry3Ba2 


crylllB 


A07234 


342-2297 


85 


cry3Bbl 


crylUBb 


M89794 


202-2157 


24 


cry3Bb2 


cryIIlC{b) 


U31633 


144-2099 


23 


cry3Cal 


crylllD 


X59797 


232-2178 


59 


cry4Aal 


crylVA 


Y00423 


1-3540 


121 


cry4Aa2 


cryJVA 


D00248 


393-3935 


95 


cry4Bal 


crylVB 


X07423 


157-3564 


16 


cry4Ba2 


crylVB 


X07082 


151-3558 


112 


cry4Ba3 


crylVB 


M20242 


526-3930 


125 


cry4Ba4 


crylVB 


D00247 


461-3865 


95 


crySAal 


cryVA{a) 


L07025 


1->4155 


102 


crySAbl 


cryVAib) 


L07026 


l->3867 


67 


crySAcl 




134543 


l->3660 


76 


crySBal 


PS86Q3 


U19725 


l->3735 


76 


cry6Aa} 


cryVlA 


L07022 


1->1425 


68 


cry6Bal 


cryVlB 


L07024 


1->1185 


67 


cryJAal 


crylliC 


M64478 


184-3597 


58 


cryJAbl 


cryJIIC{b) 


U04367 


1->3414 


75 


cry7Ab2 


cryIIIC{c) 


U04368 


1->3414 


75 


crySAal 


crylllE 


U04364 


1->3471 


29 


crySBal 


crylllG 


U04365 


l->3507 


66 


crySCal 


crylllF 


U04366 


1-3447 


70 


cryQAaJ 


crylG 


X58120 


5807-9274 


104 


cry9Aa2 


crylG 


X58534 


385->3837 


32 


cry9Bal 


cryX 


X75019 


26-3488 


97 


cry9Cal 


crylH 


Z37527 


2096-5569 


57 


cry9Dal 


N141 


D85560 


47-3553 


4 


cry9Da2 




AF042733 


<1->1937 


122 


crylOAal 


crylVC 


M12662 


941-2965 


111 


cryllAal 


crylVD 


M31737 


41-1969 


21 


cryllAa2 


crylVD 


M22860 


< 1-235 


2 


cryllBal 


Jeg80 


X86902 


64-2238 


19 


cryllBbl 


94 kDa 


AF017416 




72 


crylZial 


cryVB 


L07027 


1->3771 


67 


cryl3Aal 


cryVC 


L07023 


1-2409 


90 


cryl4Aal 


cryVD 


U13955 


1-3558 


77 


crylSAal 


34kDa 


M76442 


1036-2055 


11 


cryl6Aal 


cbmll 


X94146 


158-1996 


5 


crylVAai 


cbm72 


X99478 


12-1865 


5 


crylSAal 


cryBPl 


X99049 


743-2860 


126 


cryWAal 


Jeg65 


Y07603 


719-2662 


86 


cryl9Bal 




D88381 




87 


cry20Aal 


86kDa 


U82518 


60-2318 


61 


cry21Aal 




132932 


1-3501 


74 


cry22Aal 




134547 


1-2169 


76 


cytlAal 


cytA 


X03182 


1 /in oo£ 
14U-000 


1 1 Q 


cytlAa2 


cytA 


X04338 


509-1255 


120 


cytlAa3 


cytA 


Y00135 


36-782 


26 


cytlAa4 


cytA 


M35968 


67-813 


30 


cytlAbl 


cytM 


X98793 


28-777 


109 


cytlBal 




U37196 


1-795 


78 


cyt2Aal 


cytB 


Z14147 


270-1046 


51 


cyt2Bal 


"cytB" 


U52043 


287-655 


35 


cyt2Bbl 




U82519 


416-1204 


15 



" The symbols < and > indicate that the coding region extends up- or downstream, respectively, from the known sequence data. 
* Only the polypeptide sequence has been reported. 
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Primary Rank ^ Secondary Rank ^Tertiary Rank 




B CrylAb 

I — i CrylAe 

1 ^^ CrylAd 

' ^ CrylAc 

CrylFa 
CrylFb 
CrylGa 
CrylGb 

-r CrylDa 

■i CrylDb 

-f CrylHa 

-I CrylHb 

-i CrylEa 

4 CrylEb 

-g CrylJa 

4 CrylJb 

CrylCa 

4 CrylCb 

I CrylBb 

1 CrylBc 

■f CrylBd 

■I CrylBa 

I CrylKa 

I Cry 1 la 

I Cry lib 

Cry7Aa 

-I Cry7Ab 

-1 Cry9Ca 

4 Cry9Da 

CrY9Ba 
Cry9Aa 

CrySAa 

-5 CrySBa 

4, CrySCa 

-f Cry3Aa 

4 Cry3Ca 

4 Cry3Ba 

-k Cry3Bb 

Cry4Aa 

T! Cry4Ba 

■I CrylOAa 

Cryl9Aa 

Cryl9Ba 
Cry20Aa 
Cryl6Aa 
Cryl7Aa 
CrySAa 

15 Cry 5 Ac 

4 CrySAb 

4 CrySBa 

I Cryl2Aa 

1^ Cry21Aa 

i Cryl3Aa 

^ Cryl4Aa 

^ Cry2Aa 

-I Cry2Ab 

13 Cry2Ac 

I CrylSAa 

-p CryllBa 

CryllBb 
CryllAa 
CytlAa 

I CytlAb 

CytlBa 

Cyt2Ba 
Cyt2Bb 
Cyt2Aa 
CrylSAa 

■B Cry6Aa 

? Cry6Ba 

^ Cry22Aa 



Percent Amino Acid Sequence Identity 

FIG. 1. Phylogram demonstrating amino acid sequence identity among Cry and Cyt proteins. This phylogenetic tree is modified from a TREEVIEW visualization 
of NEIGHBOR treatment of a CLUSTAL W multiple alignment and distance matrix of the full-length toxin sequences, as described in the text. The gray vertical bars 
demarcate the four levels of nomenclature ranks. Based on the low percentage of identical residues and the absence of any conserved sequence blocks in 
multiple-sequence alignments, the lower four lineages are not treated as part of the main toxin family, and their nodes have been replaced with dashed horizontal lines 
in this figure. 



Main Cry 
Lineage 



had significant branch points deeper in the tree than the cho- 
sen primary rank in the nomenclature. This sort of analysis was 
rejected as unsuitable for the purposes of Cry nomenclature 
due to the generally ragged branch lengths it produced and the 
requirement for the careful choice of an outgroup. 
An alternative method of clustering protein sequences, ca- 



pable of handling sequences that are quite diverse, is parsi- 
mony analysis. A consensus tree generated from 100 boot- 
straps of such an analysis displaces the two incomplete Cryl 
sequences (CrylBd and CrylAf) and the two Cryl sequences 
lacking the C- terminal protoxin segments (Cryl la and Cry lib) 
into a region of the tree populated with such shortened se- 
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quences (not shown). With the further exceptions of Cryl2A 
being interjected into the Cry5 cluster and a number of se- 
quences besides Cry6B clustering higher in the tree than 
Cry6A, the proposed nomenclature successfully reflects the 
grouping of sequences provided by this method of analysis as 
well. 

As noted above, the usual distance metrics for phylogenetic 
analysis account for multiple substitutions per site; most com- 
monly, the Dayhoff PAM metric is used. When this distance 
metric was applied to the alignment used to make Fig. 1, a 
large number of the sequence pairs were found to have infinite 
distance. Therefore, the main Cry lineage and the Cyt lineage 
were separately aligned, the distances were calculated, and the 
distance matrices were clustered by using the FITCH program 
(of the PHYLIP software package). This method of analysis 
revealed several strongly associated groups of sequences 
(>90% of trees) in the main Cry lineage that extend deeper 
into the tree than the primary rank assigned in the proposed 
nomenclature: Cryl; Cry3; Cry4; Cry7; the Cry5, Cryl2-Cryl3- 
Cryl4-Cry21 group; the CTy8-Cry9 group; the Cryl0-Cryl9 
group; the Cryl6-Cryl7 group; and the Cry2-Cryll-Cryl8 
group. Many of these groups, however, were separated by 
branch points that were either nonmajority or were found 
<60% of the time; thus, the arrangement of these groups 
would be likely to change with additional sequence additions. 
At the secondary rank, the only anomaly with respect to the 
proposed nomenclature was the interjection of the Crylla and 
Cryllb sequences into the CrylB group. This effect may be due 
to an artificially reduced distance between the Cryll sequences 
and the incomplete CrylBd sequence caused by the particular 
distance metric used. The Cyt lineage sequences were sepa- 
rated into the expected two primary rank groups that separate 
into the expected secondary rank groupings. This more stan- 
dard phylogenetic approach also suffers from an accentuated 
visual disorientation of uneven branch lengths and shortening 
of the more closely related branches, especially at the tertiary 
rank (lowercase letter), where a great deal of comparative 
work has been done among the Cryl toxins. 

In summary, the proposed nomenclature uses readily avail- 
able software that can be easily interpreted by investigators in 
the field and meets their needs as well as, or better than, 
alternative methods of analysis and presentation. When the 
holotype toxins were analyzed by alternative phylogenetic 
methods, the hierarchy implied by the nomenclature was es- 
sentially consistent with the resulting phylogenetic clustering, 
and the few exceptions were largely explainable by known 
properties of the sequences in question. 
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